Abstract

In recent years, there has been a growing interest in multiple access communication systems that
spread their transmitted energy over very large bandwidths. These systems, which are referred to as
ultra wide-band (UWB) systems, have various advantages over narrow-band and conventional wide-band
systems. The importance of multiuser detection for achieving high data rates or low bit error rates in
these systems has already been established in several studies. This paper presents iterative ("turbo")
multiuser detection for impulse radio (IR) UWB systems over multipath channels. While this approach
is demonstrated for UWB signals, it can also be used in other systems that use similar types of signaling.
When applied to the type of signals used by UWB systems, the complexity of the proposed detector can
be quite low. Also, two very low complexity implementations of the iterative multiuser detection scheme
are proposed based on Gaussian approximation and soft interference cancellation. The performance of
these detectors is assessed using simulations that demonstrate their favorable properties.

Index Terms: Ultra wide-band (UWB), impulse radio (IR), iterative multiuser detection, soft
interference cancellation.



This research was supported in part by the National Science Foundation under Grants ANI-03-38807 and CNS-06-25637.

Department of Computer Science, New York University, NY 10012, USA, e-mail: fishler@cs.nyu.edu

Department of Electrical and Electronics Engineering, Bilkent University, Bilkent, Ankara 06800, Turkey, Tel: +90 (312)
290-3139, Fax: +90 (312) 266-4192, e-mail: gezici@ee.bilkent.edu.tr

Department of Electrical Engineering, Princeton University, Princeton 08544, USA, Tel: (609) 258-2260, Fax: (609) 258-7305,
e-mail: poor@princeton.edu



I. Introduction 

In recent years, there has been a growing interest in ultra wide-band (UWB) systems, which resulted 
in the U.S. Federal Communications Commission (FCC) regulations that allow, under several restrictions, 
the widespread use of such systems. The common definition of UWB systems, which was adopted by 
the FCC as well, states that a system is a UWB system if both the absolute and the fractional bandwidths 
are large. The absolute bandwidth should be at least 0.5 GHz, while the fractional bandwidth, which 
is the signal bandwidth divided by the carrier frequency, is at least 20% [8]. UWB systems offer many 
advantages over narrow-band or conventional wide-band systems. Among these advantages are reduced 
fading margins, simple transceiver designs, low probability of detection, good anti-jam capabilities, and 
accurate positioning (see, [5], [33], [14], and references therein). The advantages of UWB technology 
have caused this technology to be considered for use as the physical layer of several applications; for 
example, the IEEE 802.15.4a wireless personal area network (WPAN) standard employs this technology 
as one of the signaling options [37]. 

There are many signaling methods for transmitting over UWB channels, and it is obvious that, 
apart from engineering difficulties, one can use any existing spread spectrum technique for transmitting 
over UWB channels [10], [32]. However, these difficulties might be quite significant, preventing the 
actual use of conventional spread-spectrum methods for transmitting over UWB channels. Consider, 
as an example, long-code direct-sequence code-division-multiple-access (DS-CDMA) systems. In these 
systems, implementing even the simplest detector, namely the matched filter detector, requires sampling 
of the received signal at least at the chip rate, which under the current regulations might be as large as 
7.5 GHz. Such sampling rates are difficult to achieve, and result in high power consumption. 

In order to overcome some of the difficulties associated with UWB signaling, impulse radio (IR) 
systems, and especially time-hopping impulse radio (TH-IR) systems have been proposed as the preferred 
modulation scheme for UWB systems [26]. In TH-IR systems, a train of short pulses is transmitted, and the 
information is usually conveyed by either the polarity or location of the transmitted pulses. In addition, in 
order to allow many users to share the same channel, an additional random (or pseudo-random) time shift, 
known to the receiver, is added to the starting point of each pulse. This way, the probability of catastrophic
collisions between two users transmitting over the same channel at the same time is significantly reduced 
[26]. 



TH-IR modulation, e.g., binary phase shift keyed (BPSK) TH-IR, to be discussed in the following 
sections, has many advantages over conventional modulation techniques. By using very short pulses, the 
transmitted energy is spread over a very large bandwidth. In addition, by using pseudo-random time 
intervals between the transmitted pulses and random pulse polarities, spectral lines and other spectral 
impairments are avoided [13]. The implementation of the receiver is usually easier for this technique 
because the channel is excited for only a fraction of the total transmission time. For example, the matched 
filter detector needs to sample the filter matched to the received pulse only at time instants when pulses 
corresponding to the user of interest arrive at the receiver. Moreover, base-band pulses are typically used 
in UWB systems, saving the need for complex frequency synchronization and tracking.¹ These advantages
make TH-IR the preferred modulation scheme for transmitting over UWB channels in various applications.
It should be noted that IR-UWB has been chosen as one of the modulation formats for the IEEE 802.15.4a
WPAN standard.

It has been observed [9], [21], [27], [35] that the transmitted and received signals of TH-IR systems can 
be described by the same models used for describing the transmitted and received signals of DS-CDMA 
systems. The main difference between classical DS-CDMA signals and TH-IR signals is that TH-IR 
signals use spreading sequences whose elements belong to the ternary alphabet, i.e., $\{-1, 0, +1\}$, instead
of the binary alphabet, i.e., $\{-1, +1\}$. This observation leads to the immediate conclusion that every
multiuser detector designed for CDMA systems can be used in TH-IR systems as well. In particular, 
the optimal multiuser detector can be easily deduced from [30], and the complexity of this detector for 
systems transmitting over multipath channels is known to be exponential in the number of active users 
and the number of transmitted symbols falling within the delay spread of the channel. Linear receivers 
can be designed as well, resulting in multiuser detectors having complexity that is polynomial in the 
number of active users and the size of the observation windows used by the detector [1], [22]. 

Although the classical algorithms for multiuser detection can be used in TH-IR systems, it is evident
that low complexity multiuser detection algorithms for systems that use generalized spreading sequences
in general and IR systems in particular are required. These detectors should exploit the special type
of signals TH-IR systems transmit in order to reduce the complexity of multiuser detectors. In [9], an
iterative multiuser detector exploiting the special structure of TH-IR signals is proposed for additive white
Gaussian noise (AWGN) channels. Iterative multiuser detectors can be designed for TH-IR systems by

¹It should be noted, however, that if the channel is composed of a very large number of equipower paths, then the receiver
complexity becomes very large due to the need to sample all of them in order to achieve diversity combining.

considering the TH-IR signaling structure as a concatenated coding system, where the inner code is the 
modulation and the outer code is the repetition code. Such a technique makes use of the similarity between 
TH-IR signaling and bit interleaved coded modulation (BICM), where the inner code is modulation and 
the outer code is channel coding [2], [6], [18], [36]. 

In this paper, we first present an extension of the iterative multiuser detector in [9] to more realistic 
multipath channels. Namely, we propose an iterative detector structure that combines energy from a 
number of multipath components. Although only random TH-IR systems are described in the sequel, the 
multiuser detectors presented in this paper can be applied to any other type of DS-CDMA system whose 
spreading sequences contain a large fraction of zeros. As such, the contribution of this paper goes beyond the
theory of UWB systems into the theory of general DS-CDMA systems. In addition, we propose two very 
low-complexity implementations of the iterative algorithm, which are based on Gaussian approximation 
for weak interferers, and on soft interference cancellation. 

The rest of the paper is organized as follows: In Section II, the signal model that is used throughout 
the paper is described. In Section III, an iterative multiuser detector, called the pulse-symbol iterative 
detector, is presented for frequency-selective environments. Then, two novel and low-complexity
implementations of the proposed receiver are described in Section IV. In Section V, simulations demonstrating
the performance of the proposed detector when transmitting over indoor UWB channels are presented. 
Finally, a summary and some concluding remarks are provided in Section VI. 

II. Discrete-Time Signal Model 

TH-IR systems can be modeled as DS-CDMA systems with generalized spreading sequences that
take values from the set $\{-1, 0, +1\}$ [20], [12]. Therefore, a $K$-user synchronous DS-CDMA system
transmitting over a frequency-selective channel is considered in order to obtain the discrete-time signal
model for a TH-IR system.² It is assumed that each user transmits a packet of $P$ information symbols, and
$N$ denotes the processing gain of the system. In addition, the channel between each user and the receiver
is modeled to have $L$ taps, and $\mathbf{h}_k = [h_1^k \cdots h_L^k]$ denotes the discrete-time channel impulse response
between the $k$th transmitter and the receiver. Finally, $\mathbf{s}_{k,i} = [s_{k,i,0} \cdots s_{k,i,N-1}]$ represents the spreading
sequence that the $k$th user uses for spreading its $i$th information symbol. Note that if $\mathbf{s}_{k,i} = \mathbf{s}_{k,j}$ for every
$i$ and $j$, then the system is a short-code system; otherwise it is a long-code system.

²The synchronous assumption is made for notational convenience, but as we discuss in the sequel, the proposed algorithm
works equally well in asynchronous systems.

A chip-sampled discrete-time model for the received signal can be described as follows:

$$\mathbf{r} = \sum_{k=1}^{K} \sqrt{E_k}\, \mathbf{H}_k \mathbf{S}_k \mathbf{b}_k + \mathbf{n}, \tag{1}$$

where, for the $k$th user ($k = 1, \ldots, K$): $E_k$ is the transmitted energy per symbol; $\mathbf{H}_k$ is an $(NP+L-1) \times NP$
matrix, whose $i$th column is equal to $[\mathbf{0}_{i-1}, \mathbf{h}_k, \mathbf{0}_{NP-i}]^T$, and $\mathbf{0}_l$ is the all-zero row vector of length $l$;
$\mathbf{S}_k$ is an $NP \times P$ spreading matrix containing the $P$ spreading sequences that the $k$th user uses for spreading
the transmitted symbols, $\mathbf{S}_k = \left[ [\mathbf{s}_{k,1}\ \mathbf{0}_{N(P-1)}]^T, [\mathbf{0}_N\ \mathbf{s}_{k,2}\ \mathbf{0}_{N(P-2)}]^T, \ldots, [\mathbf{0}_{N(P-1)}\ \mathbf{s}_{k,P}]^T \right]$; and
$\mathbf{b}_k = [b_1, \ldots, b_P]^T$ is the vector containing the transmitted information symbols of the $k$th user. Throughout
this paper, it is assumed that the transmitted information symbols are binary (i.e., elements of $\{-1, +1\}$),
although the extension to more general cases is straightforward. Here, $\mathbf{n} = [n_1, \ldots, n_{NP+L-1}]^T$ is the
sampled additive noise vector, assumed to be normally distributed with zero mean and correlation matrix
$\sigma_n^2 \mathbf{I}$, i.e., $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{I})$. In the sequel, this system is referred to as a BPSK TH-IR system.

Denote by $\mathbf{b} = [\mathbf{b}_1^T, \mathbf{b}_2^T, \ldots, \mathbf{b}_K^T]^T$ the vector containing the transmitted symbols of the various
users, by $\mathbf{S}$ the block diagonal matrix with the users' spreading matrices on its diagonal, and by
$\mathbf{H} = [\mathbf{H}_1, \mathbf{H}_2, \ldots, \mathbf{H}_K]$ the concatenation of the users' channel matrices. With the aid of $\mathbf{H}$, $\mathbf{S}$, and
$\mathbf{b}$, the following model for the received signal can be deduced:

$$\mathbf{r} = \mathbf{H}\mathbf{S}\mathbf{b} + \mathbf{n}. \tag{2}$$

In deriving (2), it is assumed without loss of generality that the users' channel impulse responses are
scaled to absorb the transmitted energy per bit.
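
The banded structure of each channel matrix can be illustrated with a small numerical sketch. The code below is only an illustration under stated assumptions: the helper names `channel_matrix` and `matvec`, the two-tap channel, and the three-chip input are hypothetical and do not come from the paper. It builds a matrix whose $i$th column is the impulse response shifted down by $i-1$ positions, so that multiplying it by a chip vector reproduces a linear convolution (noise is omitted).

```python
def channel_matrix(h, num_cols):
    """Build the (num_cols + L - 1) x num_cols matrix whose i-th column
    is the impulse response h shifted down by i positions (0-based)."""
    num_rows = num_cols + len(h) - 1
    H = [[0.0] * num_cols for _ in range(num_rows)]
    for i in range(num_cols):
        for l, tap in enumerate(h):
            H[i + l][i] = tap
    return H


def matvec(M, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Hypothetical two-tap channel and three transmitted chips (noise omitted).
h = [1.0, 0.5]
chips = [1.0, 0.0, -1.0]
H = channel_matrix(h, len(chips))
received = matvec(H, chips)
print(received)  # [1.0, 0.5, -1.0, -0.5]
```

The output has length `len(chips) + len(h) - 1`, matching the $(NP+L-1)$ rows of the channel matrix in the model.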

Equation (2) can also be used to describe DS-CDMA systems, in which case it is usually assumed
that all the elements of $\mathbf{S}$ belong to $\{\pm 1/\sqrt{N}\}$, where $N$ is the spreading gain. IR systems are, in a
sense, generalizations of DS-CDMA systems, where in IR systems all the elements of $\mathbf{S}$ belong to
$\{\pm 1/\sqrt{N_f}, 0\}$, where $N_f$ is the number of pulses (or "chips" in the CDMA terminology) each user
transmits per information symbol. Since each symbol interval in an IR system is divided into $N_f$ equal
intervals, called frames, and a single pulse is transmitted in each frame, $N_f$ is also called the number of
frames per symbol.



In practice each user, say the $k$th user, is assigned a random, or a long pseudo-random, TH sequence,
denoted by $\{c_j^k\}$. This sequence is known to the receiver, but the elements of this sequence can be modeled
for analytical purposes as independent and identically distributed (i.i.d.) random variables, uniformly
distributed in $\{0, 1, \ldots, N_c - 1\}$. Denote by $\mathbf{s}_k = [\mathbf{s}_{k,1}, \mathbf{s}_{k,2}, \ldots, \mathbf{s}_{k,P}]$ the concatenation of the spreading
sequences of the $k$th user. The elements of $\mathbf{s}_k$ are related to the $k$th user's TH sequence as follows: the
elements of $\mathbf{s}_k$ corresponding to indices $\{(j-1)N_c + c_j^k + 1\}_{j=1}^{PN_f}$ are binary random variables, while
the remaining elements are zero. Note that random CDMA systems can be described by this model by
taking $N_f = N$.
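
As a rough illustration of this construction, the following sketch draws one symbol's worth of a random TH-IR spreading sequence. It is a sketch under assumptions rather than the authors' implementation: the function name `th_ir_spreading_sequence` is hypothetical, $N_c$ is interpreted as the number of chip positions per frame (so one symbol spans $N_f N_c$ chips), and the nonzero entries are scaled to $\pm 1/\sqrt{N_f}$ as in the ternary alphabet described above. Indices are 0-based in the code, so position `j * n_c + c_j` corresponds to the 1-based index $(j-1)N_c + c_j + 1$.

```python
import random


def th_ir_spreading_sequence(num_frames, chips_per_frame, rng):
    """Draw a ternary spreading sequence for one symbol.

    Each frame holds exactly one pulse, placed at a uniformly drawn
    time-hopping offset and given a random polarity.
    """
    amplitude = 1.0 / num_frames ** 0.5
    seq = [0.0] * (num_frames * chips_per_frame)
    th_code = []
    for j in range(num_frames):
        c_j = rng.randrange(chips_per_frame)   # uniform on {0, ..., N_c - 1}
        polarity = rng.choice((-1, 1))          # random pulse polarity
        seq[j * chips_per_frame + c_j] = polarity * amplitude
        th_code.append(c_j)
    return seq, th_code


rng = random.Random(0)  # fixed seed only to make the example repeatable
seq, code = th_ir_spreading_sequence(num_frames=5, chips_per_frame=8, rng=rng)
nonzero = [i for i, v in enumerate(seq) if v != 0.0]
print(code, nonzero)
```

Only $N_f$ of the $N_f N_c$ entries are nonzero, which is the sparsity that the detectors in the following sections exploit.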

III. The Pulse-Symbol Iterative Detector 

In this section, a low-complexity receiver structure, called the "pulse-symbol (iterative) detector" is 
proposed for TH-IR systems in frequency selective environments. Since the receiver does not require 
chip-rate or Nyquist rate sampling, it facilitates simple implementations in the context of UWB systems. 

Denote by $\mathcal{L}^k = \{l_1^k, \ldots, l_M^k\}$, with $l_m^k \in \{1, 2, \ldots, L\}$ and $M \le L$, the indices of the signal paths the
receiver combines for user $k$. In other words, the proposed receiver samples the received signal at the
time instances when pulses arrive through the paths indexed by $\mathcal{L}^k$ for $k = 1, \ldots, K$. It can be easily
seen that these sampling times are $\{((j-1)N_c + c_j^k + l_m^k)\,T_c\}$ for $j = 1, \ldots, PN_f$, $k = 1, \ldots, K$, and
$m = 1, \ldots, M$, where $T_c$ is the pulse width.
Denote by $r_{j,m}^k$ the received sample corresponding to the $j$th pulse of the $k$th user via the $m$th signal
path. Note that the total number of samples per symbol from all frames and signal paths of all users
can be as high as $N_f M K$, which can result in a very high-complexity receiver structure. Therefore, we
consider a receiver that combines the samples from different multipath components in each frame by
maximal ratio combining (MRC) for each user. Let $\tilde{r}_j^k$ denote this combined sample in the $j$th frame of
user $k$. Then,

$$\tilde{r}_j^k = \sum_{m=1}^{M} h_{l_m^k}^k\, r_{j,m}^k, \tag{3}$$

and the samples from user $k$ can be expressed as $\tilde{\mathbf{r}}_k = [\tilde{r}_1^k \cdots \tilde{r}_{N_f P}^k]^T$. The proposed receiver is depicted
in Figure 1. It is easy to verify that $r_{j,m}^k$ is the $((j-1)N_c + c_j^k + l_m^k)$th element of $\mathbf{r}$ defined in (2), and
therefore a matrix, $\mathbf{G}_k$, which performs selection and MRC of selected samples, can be designed such
that $\tilde{\mathbf{r}}_k = \mathbf{G}_k \mathbf{r}$.
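
The selection-and-combining step can be sketched as follows. This is a minimal illustration, not the receiver of the paper: the helper names `convolve` and `mrc_combine`, the three-tap channel, the TH code, and the polarities are all hypothetical, and a single noiseless user is assumed so that the result can be checked by hand. Tap indices are 0-based in the code.

```python
def convolve(x, h):
    """Linear convolution of a chip sequence with a channel impulse response."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi != 0.0:
            for l, hl in enumerate(h):
                y[i + l] += xi * hl
    return y


def mrc_combine(r, th_code, chips_per_frame, taps, gains):
    """For each pulse, pick the samples at its selected path delays and
    weight them by the corresponding path gains (maximal ratio combining)."""
    combined = []
    for j, c_j in enumerate(th_code):
        base = j * chips_per_frame + c_j  # chip index where pulse j starts
        combined.append(sum(g * r[base + l] for l, g in zip(taps, gains)))
    return combined


# Hypothetical single-user example: 3 frames of 8 chips, symbol b = -1.
h = [1.0, 0.5, 0.25]
th_code = [0, 3, 1]
polarities = [1, -1, 1]
b = -1
x = [0.0] * (3 * 8)
for j, (c_j, p) in enumerate(zip(th_code, polarities)):
    x[j * 8 + c_j] = p * b
r = convolve(x, h)
combined = mrc_combine(r, th_code, 8, taps=[0, 1, 2], gains=h)
print(combined)  # [-1.3125, 1.3125, -1.3125]
```

Each combined sample equals the pulse polarity times the symbol times the sum of squared path gains ($1 + 0.25 + 0.0625 = 1.3125$ here), and only $N_f$ combined samples per symbol are needed instead of chip-rate sampling.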



Based on the samples obtained as in (3), the pulse-symbol detector performs an iterative estimation of
users' symbols. In general, iterative algorithms provide low complexity and close-to-optimal solutions for 
many problems (see, [15], [23], [31], [6], [18], among many others; a review is found in [24]). The main 
property of the problems that can be solved efficiently by iterative techniques is that these problems have 
a very special structure, which allows productive use of iterative procedures. Consider as an example the 
problem of joint multiuser detection and decoding of error correcting codes in CDMA systems [23]. In 
this problem, one can employ any multiuser detection algorithm (or more precisely a multiuser receiver 
[28]) that results in soft decision statistics about every channel symbol. These soft decisions can be 
fed into any soft decoding algorithm, and the result will be the estimated information symbol. Turbo 
based algorithms provide an efficient way of iterating between the results obtained by the two constituent 
algorithms, where each one of these algorithms is designed to solve one part of the problem. Although 
no such structure exists in the problem of multiuser detection of TH-IR signals, some of the a priori 
information can be neglected in order to impose a structure suitable for an iterative decoding algorithm. 
In other words, the spreading operation is regarded as a simple error correcting encoding to facilitate 
iterative solutions. In this light, TH-IR signaling can be considered as a concatenated coding system, 
where the inner code involves the modulation of a UWB pulse, and the outer code is a repetition code.³
This structure is similar to BICM, for which modulation and channel coding comprise the inner and outer 
codes, respectively [2], [6]. 

Consideration of TH-IR systems as BICM systems facilitates the design of the pulse-symbol iterative
detector, which is composed of two stages [9]. The first stage is denoted as the "pulse detector", while the
second stage is denoted as the "symbol detector", and the detector iterates between these stages. In the
first stage, it is assumed that different pulses from the same user correspond to independent information
symbols, while in the second stage the information that several pulses from the same user correspond to
the same information symbols is exploited. The second stage acts effectively as a decoder.

³Unlike conventional turbo receivers, there is not a separate interleaver unit between the coding units in the proposed structure.
However, the function of an interleaver in reducing the correlation between the soft output of each decoder unit and the input
data sequence (called the iterative decoding suitability criterion [17], [25]) is performed by the TH and polarity randomization
codes in the proposed system. By means of TH and polarity codes [11], inputs to the demodulator and the decoder blocks
become essentially independent.



A. The Pulse Detector 

Denote by $b_j^k$ the information symbol carried by the $j$th pulse of the $k$th user. Note that although we
know a priori that $b_{(i-1)N_f+1}^k = \cdots = b_{iN_f}^k$ for every $k = 1, \ldots, K$ and $i = 1, \ldots, P$, this information
will be ignored by the pulse detector. As such, at the $n$th iteration the pulse detector computes the a
posteriori log-likelihood ratio (LLR) of $b_j^k$, given $\tilde{r}_j^k$ in (3), the information about the transmitted pulses
from other users and the a priori information about $b_j^k$ provided by the symbol detector, as

$$L_1^n(b_j^k) = \log \frac{\Pr(b_j^k = 1 \mid \tilde{r}_j^k)}{\Pr(b_j^k = -1 \mid \tilde{r}_j^k)} = \log \frac{f(\tilde{r}_j^k \mid b_j^k = 1)}{f(\tilde{r}_j^k \mid b_j^k = -1)} + \log \frac{\Pr(b_j^k = 1)}{\Pr(b_j^k = -1)}, \tag{4}$$

for $j = 1, \ldots, PN_f$ and $k = 1, \ldots, K$, where $f(\tilde{r}_j^k \mid b_j^k = i)$ is the likelihood of the $j$th combined sample
corresponding to the $k$th user given that the transmitted symbol was $i \in \{\pm 1\}$. It is seen that the a posteriori
LLR is the sum of the a priori LLR of the transmitted symbol, $\log \frac{\Pr(b_j^k = 1)}{\Pr(b_j^k = -1)} = \lambda_2^{n-1}(b_j^k)$, and the extrinsic
information provided by the pulse detector about the transmitted symbol, $\log \frac{f(\tilde{r}_j^k \mid b_j^k = 1)}{f(\tilde{r}_j^k \mid b_j^k = -1)} = \lambda_1^n(b_j^k)$ [9].

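We first consider the computation of $\log f(\tilde{r}_j^k \mid b_j^k)$ in (4). From (2), it is easy to deduce the following
model for $r_{j,m}^k$, which is the received sample from the $m$th path of the $k$th user's signal in the $j$th frame:

$$r_{j,m}^k = [\mathbf{H}]_{l(j,k,m),:}\, \mathbf{S}\mathbf{b} + n_{l(j,k,m)} = \sum_{q=1}^{K} \sum_{a=0}^{N_f P - 1} [\mathbf{S}_q]_{aN_c + c_a^q,\, \lfloor a/N_f \rfloor}\; h^q_{l(j,k,m) - aN_c - c_a^q}\; b_a^q + n_{l(j,k,m)}, \tag{5}$$

where $l(j,k,m)$ is the arrival time of the $j$th pulse of the $k$th user via the $m$th path, that is $l(j,k,m) =
(j-1)N_c + c_j^k + l_m^k$; $[\mathbf{H}]_{l(j,k,m),:}$ is the $l(j,k,m)$th row of $\mathbf{H}$; $[\mathbf{S}_m]_{k,l}$ is the $(k,l)$th element of the matrix
$\mathbf{S}_m$; and $n_{l(j,k,m)}$ is the $l(j,k,m)$th element of the noise vector, $\mathbf{n}$. This model can be simplified further
by noting that the vast majority of the summands in (5) are zero. Let $\mathcal{A}$ denote the set of distinct
$(q, a)$ pairs in the right-hand-side (RHS) of (5) such that the corresponding element in the double sum
is not zero; i.e.,⁴

$$\mathcal{A} = \left\{ (q, a) \in \mathcal{K} \times \mathcal{F} \;:\; [\mathbf{S}_q]_{aN_c + c_a^q,\, \lfloor a/N_f \rfloor}\; h^q_{l(j,k,m) - aN_c - c_a^q} \neq 0 \right\}, \tag{6}$$

where $\mathcal{K} = \{1, \ldots, K\}$ and $\mathcal{F} = \{0, \ldots, PN_f - 1\}$. If $K_{j,m}^k$ represents the number of summands in (5)
that are different from zero, $\mathcal{A}$ consists of $K_{j,m}^k$ pairs. Note that the pair $(k, j)$ is always in $\mathcal{A}$; hence,
$K_{j,m}^k \ge 1$ for every $j$, $k$ and $m$. Assume, without loss of generality, that the pair $(k, j)$ is the first element
of the set $\mathcal{A}$.

⁴Note that the dependence of $\mathcal{A}$ on $j$, $k$ and $m$ is not shown explicitly for notational simplicity.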

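Let $q(i)$ and $a(i)$ represent, respectively, the first and the second components of the $i$th pair in set $\mathcal{A}$
for $i = 1, \ldots, K_{j,m}^k$. Then, (5) can be further simplified as follows:

$$r_{j,m}^k = h_{l_m^k}^k\, b_j^k\, [\mathbf{S}_k]_{jN_c + c_j^k,\, \lfloor j/N_f \rfloor} + (\mathbf{h}_{j,m}^*)^T \mathbf{b}_{j,m}^* + n_{l(j,k,m)}, \tag{7}$$

where

$$\mathbf{h}_{j,m}^* = \left[ [\mathbf{S}_{q(2)}]_{a(2)N_c + c_{a(2)}^{q(2)},\, \lfloor a(2)/N_f \rfloor}\, h^{q(2)}_{l(j,k,m) - a(2)N_c - c_{a(2)}^{q(2)}},\; \ldots,\; [\mathbf{S}_{q(K_{j,m}^k)}]_{a(K_{j,m}^k)N_c + c_{a(K_{j,m}^k)}^{q(K_{j,m}^k)},\, \lfloor a(K_{j,m}^k)/N_f \rfloor}\, h^{q(K_{j,m}^k)}_{l(j,k,m) - a(K_{j,m}^k)N_c - c_{a(K_{j,m}^k)}^{q(K_{j,m}^k)}} \right]^T$$

and $\mathbf{b}_{j,m}^* = \left[ b_{a(2)}^{q(2)}, \ldots, b_{a(K_{j,m}^k)}^{q(K_{j,m}^k)} \right]^T$.

From (3) and (7), $\tilde{r}_j^k$ can be expressed as

$$\tilde{r}_j^k = A\, b_j^k + \sum_{m=1}^{M} h_{l_m^k}^k\, (\mathbf{h}_{j,m}^*)^T \mathbf{b}_{j,m}^* + n_j^k, \tag{8}$$

where $A = [\mathbf{S}_k]_{jN_c + c_j^k,\, \lfloor j/N_f \rfloor} \sum_{m=1}^{M} (h_{l_m^k}^k)^2$ and $n_j^k = \sum_{m=1}^{M} h_{l_m^k}^k\, n_{l(j,k,m)}$, which is distributed as
$\mathcal{N}\!\left(0, \sigma_n^2 \sum_{m=1}^{M} (h_{l_m^k}^k)^2\right)$.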
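Based on (8), the log-likelihood of $\tilde{r}_j^k$ given $b_j^k$ is

$$\log f(\tilde{r}_j^k \mid b_j^k) = C + \log \sum_{\mathbf{b} \in \{\pm 1\}^{K_j^k}} \exp\left\{ -\frac{\left( \tilde{r}_j^k - A\, b_j^k - \sum_{m=1}^{M} h_{l_m^k}^k\, (\mathbf{h}_{j,m}^*)^T \mathbf{b}_{j,m}^* \right)^2}{2 \sigma_n^2 \sum_{m=1}^{M} (h_{l_m^k}^k)^2} \right\} \Pr(\mathbf{b}), \tag{9}$$

where $C$ is a constant independent of $j$ and $k$, $\mathbf{b}$ is a vector comprised of the distinct $b_a^q$'s in $\mathbf{b}_{j,1}^*, \ldots, \mathbf{b}_{j,M}^*$,
and $K_j^k$ is the size of $\mathbf{b}$. Note that $K_j^k$ represents the total number of pulses that have at least one multipath
component arriving at the receiver at the same time as one of the sampled signal paths originating from
the $j$th pulse of the $k$th user. Also note that for a given value of $\mathbf{b}$, $\mathbf{b}_{j,m}^*$ in (8) is uniquely defined,
and $\Pr(\mathbf{b})$ is the a priori probability, which is obtained from the extrinsic information provided by
the symbol detector. Since the extrinsic information from the symbol detector is the following LLR,
$\lambda_2^{n-1}(b_j^k) = \log \frac{\Pr(b_j^k = 1)}{\Pr(b_j^k = -1)}$ [cf. (12)], it can be shown [9], with the aid of some algebraic manipulations, that

$$\Pr(\mathbf{b}) = \frac{1}{2^{K_j^k}} \prod_{i=1}^{K_j^k} \left[ 1 + [\mathbf{b}]_i \tanh\left( \tfrac{1}{2} \lambda_2^{n-1}([\mathbf{b}]_i) \right) \right]. \tag{10}$$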
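
From (9) and (10), the a priori LLR of $b_j^k$ can be written as follows:

$$\lambda_1^n(b_j^k) = \log \frac{f(\tilde{r}_j^k \mid b_j^k = 1)}{f(\tilde{r}_j^k \mid b_j^k = -1)} = \log \frac{\displaystyle \sum_{\mathbf{b} \in \{\pm 1\}^{K_j^k}} e^{-\frac{1}{2\sigma^2} \left( \tilde{r}_j^k - A - \sum_{m=1}^{M} h_{l_m^k}^k (\mathbf{h}_{j,m}^*)^T \mathbf{b}_{j,m}^* \right)^2} \prod_{i=1}^{K_j^k} \left[ 1 + [\mathbf{b}]_i \tanh\left( \tfrac{1}{2} \lambda_2^{n-1}([\mathbf{b}]_i) \right) \right]}{\displaystyle \sum_{\mathbf{b} \in \{\pm 1\}^{K_j^k}} e^{-\frac{1}{2\sigma^2} \left( \tilde{r}_j^k + A - \sum_{m=1}^{M} h_{l_m^k}^k (\mathbf{h}_{j,m}^*)^T \mathbf{b}_{j,m}^* \right)^2} \prod_{i=1}^{K_j^k} \left[ 1 + [\mathbf{b}]_i \tanh\left( \tfrac{1}{2} \lambda_2^{n-1}([\mathbf{b}]_i) \right) \right]}, \tag{11}$$

where $\sigma^2 = \sigma_n^2 \sum_{m=1}^{M} (h_{l_m^k}^k)^2$ is the variance of the combined noise term.

From (11) and (4), it is observed that the a posteriori LLR is given by the sum of the prior information
obtained from the symbol detector and the extrinsic information.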
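
The enumeration performed by the pulse detector can be sketched in a few lines. The code below is a simplified illustration under stated assumptions, not the authors' implementation: the function name `pulse_extrinsic_llr` is hypothetical, each colliding pulse is assumed to contribute through a single effective gain after combining (the list `gains`), and `noise_var` stands for the variance of the combined noise term. The prior of each colliding pulse is computed from its LLR with the tanh expression used in the text, and the cost grows as $2^{\text{len(gains)}}$.

```python
import itertools
import math


def pulse_extrinsic_llr(r_tilde, a, gains, prior_llrs, noise_var):
    """Extrinsic LLR of the pulse of interest, obtained by summing the
    Gaussian likelihood over all sign patterns of the colliding pulses."""

    def likelihood(b):
        total = 0.0
        for betas in itertools.product((-1, 1), repeat=len(gains)):
            mean = a * b + sum(g * beta for g, beta in zip(gains, betas))
            prior = 1.0
            for beta, llr in zip(betas, prior_llrs):
                # Pr(beta) = 0.5 * (1 + beta * tanh(llr / 2))
                prior *= 0.5 * (1.0 + beta * math.tanh(0.5 * llr))
            total += math.exp(-(r_tilde - mean) ** 2 / (2.0 * noise_var)) * prior
        return total

    return math.log(likelihood(+1) / likelihood(-1))


# With no colliding pulses the LLR reduces to 2 * a * r / noise_var.
print(pulse_extrinsic_llr(0.5, 1.0, [], [], 1.0))

# One colliding pulse with an uninformative prior (LLR = 0), hypothetical gain 0.7.
print(pulse_extrinsic_llr(0.9, 1.0, [0.7], [0.0], 1.0))
```

A practical implementation would likely work in the log domain to avoid underflow when the noise variance is small; this sketch keeps the direct form so that it mirrors the structure of the ratio of sums.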

B. The Symbol Detector 

The symbol detector exploits the fact that $b_{(i-1)N_f+1}^k = \cdots = b_{iN_f}^k$ for every $k = 1, \ldots, K$ and
$i = 1, \ldots, P$. Therefore, the symbol detector computes the a posteriori LLR of $b_j^k$ given the extrinsic
information from the pulse detector, and given $b_{(i-1)N_f+1}^k = \cdots = b_{iN_f}^k$, for every $k = 1, \ldots, K$ and
$i = 1, \ldots, P$. It can be shown that this LLR has the following general structure [9]:

$$L_2^n(b_j^k) = \log \frac{\Pr\left( b_j^k = 1 \mid \{\lambda_1^n(b_i^k)\}_{i=1}^{PN_f};\ \text{constraints on pulses} \right)}{\Pr\left( b_j^k = -1 \mid \{\lambda_1^n(b_i^k)\}_{i=1}^{PN_f};\ \text{constraints on pulses} \right)} = \underbrace{\sum_{\substack{i = N_f \lfloor (j-1)/N_f \rfloor + 1 \\ i \neq j}}^{N_f \lfloor (j-1)/N_f \rfloor + N_f} \lambda_1^n(b_i^k)}_{\lambda_2^n(b_j^k)} + \lambda_1^n(b_j^k), \tag{12}$$

where the constraints are $b_{(i-1)N_f+1}^k = \cdots = b_{iN_f}^k$, for every $k = 1, \ldots, K$ and $i = 1, \ldots, P$. In (12), the
a posteriori LLR at the output of the symbol detector is expressed as the sum of the prior information
from the pulse detector, $\lambda_1^n(b_j^k)$, and the extrinsic information about $b_j^k$, denoted by $\lambda_2^n(b_j^k)$. This extrinsic
information is obtained from the information about all the pulses except the $j$th pulse of the $k$th user. In
the next iteration this information is fed back to the pulse detector as a priori information about the $j$th
pulse of the $k$th user.
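
Because the outer code is a repetition code, the symbol detector's extrinsic output for a pulse is simply the sum of the pulse-detector LLRs of the other pulses that carry the same symbol. A minimal sketch for a single user follows; the function name `symbol_detector_extrinsic` and the numeric LLR values are hypothetical.

```python
def symbol_detector_extrinsic(pulse_llrs, frames_per_symbol):
    """For every pulse, return the sum of the pulse-detector LLRs of all
    other pulses belonging to the same information symbol."""
    extrinsic = []
    for start in range(0, len(pulse_llrs), frames_per_symbol):
        block = pulse_llrs[start:start + frames_per_symbol]
        total = sum(block)                      # a posteriori LLR of the symbol
        extrinsic.extend(total - llr for llr in block)
    return extrinsic


# Two symbols, three pulses (frames) per symbol, hypothetical LLR values.
llrs = [1.0, 2.0, -0.5, 0.25, 0.25, 0.5]
print(symbol_detector_extrinsic(llrs, 3))  # [1.5, 0.5, 3.0, 0.75, 0.75, 0.5]
```

The block sum is the a posteriori LLR of the symbol, so a hard decision after the last iteration can be taken from its sign.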

Note that the structure of the pulse-symbol detector is similar to the joint-over-antenna turbo receiver 
in [18], which employs multiple turbo loops for each antenna, by considering "composite" modulation 
for multiple antennas as the inner code, and channel coding for different users as the outer code. The 
main differences are that, for the pulse-symbol detector, the outer code is a simple repetition code, 
while the inner code is a binary phase shift keying modulation, and that there are also TH and polarity 
randomization operations in the pulse-symbol detector, which randomize the positions and the polarities
of the pulses in different frames. 

C. Complexity 

It is easily seen that computing $\lambda_1^n(b_j^k)$ of (11) is the most complex task in the pulse-symbol
detector. The complexity of computing $\lambda_1^n(b_j^k)$ is exponential in the total number $K_j^k$ of pulses that
have at least one multipath component arriving at the receiver at the same time as one of the sampled
signal paths originating from the $j$th pulse of the $k$th user. That is, as can be observed from (11),
the complexity of computing $\lambda_1^n(b_j^k)$ is $O(2^{K_j^k})$. Since there are $N_f$ pulses per symbol per user, the
complexity of one iteration per symbol per user is easily seen to be $O\big(\sum_{j=1}^{N_f} 2^{K_j^k}\big) = O(2^{Y(K)})$, where
$Y(K) = \max_{j=1,\ldots,N_f} K_j^k$. Denoting by $N_i$ the number of iterations made by the pulse-symbol detector,
the complexity of the pulse-symbol detector is $O(N_i 2^{Y(K)})$ per symbol per user.

$K_j^k$ is a random variable depending on the channel impulse response, the TH sequence, and the
number of users in the system. It is hard to compare the complexity of the pulse-symbol detector, which
is random, with the complexity of multiuser detection algorithms that have fixed complexity, e.g., the
optimal detector. Nevertheless, if, for example, the probability of the event $N_i 2^{Y(K)} > 2^K$ is very low,
then, roughly speaking, the proposed algorithm is simpler than the optimal detector.
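
To make the comparison concrete, the two hypothesis counts can be evaluated for a given collision count. The values below (five iterations, at most ten colliding pulses, twenty users) are hypothetical and chosen only for illustration; the function names are likewise illustrative.

```python
def pulse_symbol_cost(num_iterations, max_collisions):
    """Order-of-magnitude hypothesis count N_i * 2^Y(K) per symbol per user."""
    return num_iterations * 2 ** max_collisions


def optimal_detector_cost(num_users):
    """Hypothesis count 2^K used in the text as the reference for the optimal detector."""
    return 2 ** num_users


print(pulse_symbol_cost(5, 10))    # 5120
print(optimal_detector_cost(20))   # 1048576
```

Under these assumed numbers the iterative detector evaluates roughly two hundred times fewer hypotheses, but the advantage disappears once the collision count approaches the number of users.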

The exact distribution of $Y(K)$ is very complicated, and moreover, this distribution depends on the exact
channel structure, the number of paths arriving at the receiver, and the TH sequences. In what follows,
numerical examples are used to demonstrate the complexity of the pulse-symbol detector. In particular,
consider a system with 20 users, each transmitting at a rate of 2 Mbits/s over a 0.5 GHz UWB indoor
channel [7]. The receiver is sampling the first 10 multipath components; i.e., $\mathcal{L} = \{1, 2, \ldots, 10\}$. Figure
2 depicts the empirical cumulative distribution function (CDF) of $Y(K)$, averaged over 100 different
channel realizations from the channel model 1 (CM-1) of the IEEE 802.15.3a channel model, for systems
transmitting one, five and twenty pulses per symbol ($N_f = 1, 5, 20$). It is clear that the complexity of
the pulse-symbol detector decreases as the pulse rate, $N_f$, decreases. This is expected because, as the
pulse rate decreases, the probability of collisions decreases as well, which reduces the complexity of the
pulse-symbol detector. Nevertheless, the complexity of the pulse-symbol detector can be large even for
moderate numbers of pulses per symbol. In the next section, two low-complexity implementations are
presented.

IV. Low Complexity Implementations

The complexity of the pulse-symbol detector varies considerably with the system pulse rate, Nf. An 
increase in the pulse rate increases the algorithm complexity, and this complexity can be large even 
for moderate pulse rates or numbers of users. In what follows two low complexity implementations are 
described. The first one is based on approximating part of the multiple access interference (MAI) by a 
Gaussian random variable, while the second one is based on soft interference cancellation. 

A. Low-Complexity Implementation: The Gaussian Approximation Approach

The high complexity of the pulse-symbol detector is due solely to the pulse detector where the a priori 
LLR of a received sample given the transmitted symbol, $\lambda_1^n(b_j^k)$, is computed. In recent studies (see, [3],
[29], [34], [7], and references therein), UWB channels are commonly characterized as multipath channels 
with large numbers of paths, and delay spreads of up to a few tens of nanoseconds. These large delay 
spreads are equivalent to discrete-time channels having more than one hundred taps. Although the UWB 
channel consists of many taps, most of them are weak compared with the strongest tap, and only about 
five to ten taps are weaker by no more than 10 dB than the strongest tap. Therefore, most of the pulses 
colliding with the pulse of interest arrive via weak paths. 

In order to reduce the complexity of the pulse-symbol detector, we propose to model the MAI resulting
from the pulses arriving via weak paths by a Gaussian random variable. Recall that $h_{l_m^k}^k$ is the gain of the
$m$th path, through which the pulse of interest arrives at the receiver. In order to reduce the complexity
of computing $\lambda_1^n(b_j^k)$, the receiver sets a threshold $T$ (in dB) and all the pulses colliding with the pulse
of interest are divided into two groups. The first group contains all the pulses that collide with the pulse
of interest and that arrive via paths that are weaker than the $m$th path of user $k$ by no more than $T$ dB
(i.e., each path has an amplitude of at least $10 \log_{10} |h_{l_m^k}^k| - T$ dB). The second group contains all the
pulses that collide with the pulse of interest and that arrive via paths that are weaker than $h_{l_m^k}^k$ by more
than $T$ dB. Denote by $\mathcal{I}_{j,m}^{s}$ and $\mathcal{I}_{j,m}^{w}$ the indices of the pulses belonging to the first (strong) and second (weak)
group, respectively; that is,

$$\mathcal{I}_{j,m}^{s} = \left\{ i \;:\; 10 \log_{10} \left| h_{l_m^k}^k \right| - 10 \log_{10} \left| h^{q(i)}_{l(j,k,m) - a(i)N_c - c_{a(i)}^{q(i)}} \right| < T,\ i = 2, \ldots, K_{j,m}^k \right\}, \tag{13}$$

and similarly define $\mathcal{I}_{j,m}^{w}$.
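A model for $r_{j,m}^k$ can be written in terms of the index sets of the strong and weak groups, $\mathcal{I}_{j,m}^{s}$ and $\mathcal{I}_{j,m}^{w}$,
as follows:

$$r_{j,m}^k = h_{l_m^k}^k\, b_j^k\, [\mathbf{S}_k]_{jN_c + c_j^k,\, \lfloor j/N_f \rfloor} + \sum_{i \in \mathcal{I}_{j,m}^{s}} [\mathbf{h}_{j,m}^*]_i [\mathbf{b}_{j,m}^*]_i + \sum_{i \in \mathcal{I}_{j,m}^{w}} [\mathbf{h}_{j,m}^*]_i [\mathbf{b}_{j,m}^*]_i + n_{l(j,k,m)}, \tag{14}$$

where $[\mathbf{h}_{j,m}^*]_i$ and $[\mathbf{b}_{j,m}^*]_i$ denote the entries of $\mathbf{h}_{j,m}^*$ and $\mathbf{b}_{j,m}^*$ associated with the $i$th pair of $\mathcal{A}$,
the first term on the RHS represents the part of the received signal resulting from the pulse of
interest, the second term on the RHS represents that part of the MAI resulting from strong interference,
the third term on the RHS represents that part of the MAI resulting from weak interference, and the
fourth term on the RHS represents the additive Gaussian noise. Since most of the paths are considerably
weaker than the main path, it is expected that $|\mathcal{I}_{j,m}^{w}| \gg |\mathcal{I}_{j,m}^{s}|$. As such, the third term on the RHS
of (14) is the sum of a large number of random variables and we propose to model this sum as a
Gaussian random variable. The mean and the variance of the third term on the RHS of (14) are zero and
$\sum_{i \in \mathcal{I}_{j,m}^{w}} [\mathbf{h}_{j,m}^*]_i^2$, respectively. Thus we use the following approximation:

$$\sum_{i \in \mathcal{I}_{j,m}^{w}} [\mathbf{h}_{j,m}^*]_i [\mathbf{b}_{j,m}^*]_i \sim \mathcal{N}\left( 0, \sum_{i \in \mathcal{I}_{j,m}^{w}} [\mathbf{h}_{j,m}^*]_i^2 \right). \tag{15}$$

Approximating the part of the MAI corresponding to weak pulses colliding with the pulse of interest
by a Gaussian random variable results in the following approximate model for $r_{j,m}^k$:

$$r_{j,m}^k = h_{l_m^k}^k\, b_j^k\, [\mathbf{S}_k]_{jN_c + c_j^k,\, \lfloor j/N_f \rfloor} + (\mathbf{h}_{j,m}^{s})^T \mathbf{b}_{j,m}^{s} + \tilde{n}_{j,m}^k, \tag{16}$$

where $\tilde{n}_{j,m}^k$ is a zero mean Gaussian random variable with variance $(\sigma_{j,m}^k)^2 = \sigma_n^2 + \sum_{i \in \mathcal{I}_{j,m}^{w}} [\mathbf{h}_{j,m}^*]_i^2$,
and $\mathbf{h}_{j,m}^{s}$ and $\mathbf{b}_{j,m}^{s}$ contain the entries of $\mathbf{h}_{j,m}^*$ and $\mathbf{b}_{j,m}^*$ indexed by $\mathcal{I}_{j,m}^{s}$. Following the same derivations
leading to (11) and (16), the a priori log-likelihood ratio of $\tilde{r}_j^k = \sum_{m=1}^{M} h_{l_m^k}^k r_{j,m}^k$
given $b_j^k$ is then approximated by

$$\tilde{\lambda}_1^n(b_j^k) = \log \frac{f(\tilde{r}_j^k \mid b_j^k = 1)}{f(\tilde{r}_j^k \mid b_j^k = -1)} \approx \log \frac{\displaystyle \sum_{\tilde{\mathbf{b}} \in \{\pm 1\}^{\tilde{K}_j^k}} e^{-\frac{1}{2\tilde{\sigma}^2} \left( \tilde{r}_j^k - A - \sum_{m=1}^{M} h_{l_m^k}^k (\mathbf{h}_{j,m}^{s})^T \mathbf{b}_{j,m}^{s} \right)^2} \prod_{i=1}^{\tilde{K}_j^k} \left[ 1 + [\tilde{\mathbf{b}}]_i \tanh\left( \tfrac{1}{2} \lambda_2^{n-1}([\tilde{\mathbf{b}}]_i) \right) \right]}{\displaystyle \sum_{\tilde{\mathbf{b}} \in \{\pm 1\}^{\tilde{K}_j^k}} e^{-\frac{1}{2\tilde{\sigma}^2} \left( \tilde{r}_j^k + A - \sum_{m=1}^{M} h_{l_m^k}^k (\mathbf{h}_{j,m}^{s})^T \mathbf{b}_{j,m}^{s} \right)^2} \prod_{i=1}^{\tilde{K}_j^k} \left[ 1 + [\tilde{\mathbf{b}}]_i \tanh\left( \tfrac{1}{2} \lambda_2^{n-1}([\tilde{\mathbf{b}}]_i) \right) \right]}, \tag{17}$$

where $A = [\mathbf{S}_k]_{jN_c + c_j^k,\, \lfloor j/N_f \rfloor} \sum_{m=1}^{M} (h_{l_m^k}^k)^2$, $\tilde{\sigma}^2$ is the variance of $\sum_{m=1}^{M} h_{l_m^k}^k \tilde{n}_{j,m}^k$, which is $\sum_{m=1}^{M} |h_{l_m^k}^k|^2 (\sigma_{j,m}^k)^2$,
$\tilde{\mathbf{b}}$ is a vector comprised of the distinct $b_a^q$'s in $\mathbf{b}_{j,1}^{s}, \ldots, \mathbf{b}_{j,M}^{s}$, and $\tilde{K}_j^k$ is the size of $\tilde{\mathbf{b}}$.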

The proposed low complexity implementation computes the approximate a priori log-likelihood ratios,
$\{\tilde{\Lambda}_1(b^k_j)\}$, instead of the exact a priori log-likelihood ratios. The symbol detector uses these approximate
LLRs as the extrinsic information, and it computes a new set of extrinsic information variables, $\{\Lambda_2(b^k_j)\}$,
based on the approximate LLRs provided by the pulse detector. The algorithm continues to iterate between
the two stages until convergence is reached.
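
As an illustration of the pulse-detector step just described, the following minimal Python sketch evaluates an approximate log-likelihood ratio by enumerating the strong colliding pulses and treating everything else as Gaussian. It is a sketch under stated assumptions, not the implementation used for the reported results: the function and argument names are hypothetical, each strong pulse is assumed to enter the combined sample through a single already-combined gain, and the polarity of the pulse of interest is assumed to be folded into `xi`.

```python
import itertools
import math


def _log_prior(bit, llr):
    """log P(bit) for a +/-1 pulse with LLR llr = log(P(+1)/P(-1)).

    Uses 0.5 * (1 + bit * tanh(llr / 2)) = 1 / (1 + exp(-bit * llr)),
    evaluated in a form that avoids overflow for large |llr|.
    """
    x = bit * llr
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def approx_pulse_llr(r_tilde, xi, strong_gains, prior_llrs, sigma2):
    """Approximate LLR of the pulse of interest from the combined sample.

    r_tilde      : combined (Rake-weighted) received sample
    xi           : signed combined gain of the pulse of interest
    strong_gains : combined gain of each strong colliding pulse (assumption)
    prior_llrs   : extrinsic LLR of each strong colliding pulse
    sigma2       : variance of the Gaussian term (weak MAI plus noise)
    """
    def log_sum(sign):
        terms = []
        # Enumerate all 2^K hypotheses for the K strong colliding pulses.
        for bits in itertools.product((-1, 1), repeat=len(strong_gains)):
            mai = sum(g * b for g, b in zip(strong_gains, bits))
            log_lik = -((r_tilde - sign * xi - mai) ** 2) / (2.0 * sigma2)
            log_pri = sum(_log_prior(b, l) for b, l in zip(bits, prior_llrs))
            terms.append(log_lik + log_pri)
        peak = max(terms)  # log-sum-exp for numerical stability
        return peak + math.log(sum(math.exp(t - peak) for t in terms))

    return log_sum(+1) - log_sum(-1)
```

With no strong colliding pulses the sketch reduces to the Gaussian single-user value `2 * xi * r_tilde / sigma2`, and the enumeration cost doubles with every additional strong pulse, which is the exponential dependence discussed in the text.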

The complexity of the proposed scheme depends on the exact number of strong pulses colliding
with the pulse of interest, which is again a random variable. It is easily seen that the complexity of
this implementation is $O(2^{Y(K)})$, where $Y(K) = \max_{j=1,\ldots,N_f} \tilde{K}_j$. Again, we resort to a numerical
example in order to demonstrate the complexity of the proposed detector. Consider a system having 20
users, each transmitting at a rate of 2 Mbits/s over a 0.5 GHz UWB indoor channel [7]. The receiver
samples the first 10 multipath components; i.e., $\mathcal{L} = \{1, 2, \ldots, 10\}$, and the threshold T is set to 3
dB. Figure 3 depicts the empirical CDF of $Y(K)$, averaged over 100 different channel realizations from
channel model 1 (CM-1) of the IEEE 802.15.3a channel model, for systems transmitting one, five,
and twenty pulses per symbol ($N_f = 1, 5, 20$). By comparing Figure 2 and Figure 3, the reduction in
complexity relative to the pulse-symbol detector can be observed. In Figure 4, the
empirical CDF is plotted for $N_f = 5$ and various threshold values. It is observed that as the threshold is
decreased, fewer collisions are considered as strong ones, which reduces the complexity of the algorithm.

Using the same approach, there are other ways of reducing the complexity of the pulse-symbol detector.
For example, one can divide the received pulses into two groups based on their relative strengths. In this
approach, a threshold $\theta$ is set in advance, and the MAI caused by all but the $\theta$ strongest colliding
pulses is modelled as a Gaussian random variable. The complexity of the receiver
is then limited by $N_f 2^{\theta}$ per symbol per user.
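
The fixed-size variant described above can be sketched as follows. This is only an illustration: the names `split_strongest`, `hypotheses_per_symbol`, and `num_strong` are hypothetical, and the variance of the weak group is computed under the assumption that the weak pulses are independent and equiprobable, which the text does not state explicitly.

```python
def split_strongest(gains, num_strong):
    """Split colliding-pulse gains into a strong group and a Gaussian term.

    Returns the num_strong largest-magnitude gains and the variance of the
    remaining pulses when they are lumped into one Gaussian variable
    (assumption: independent, equiprobable +/-1 pulses, so variances add).
    """
    order = sorted(range(len(gains)), key=lambda i: abs(gains[i]), reverse=True)
    strong = [gains[i] for i in order[:num_strong]]
    weak_variance = sum(gains[i] ** 2 for i in order[num_strong:])
    return strong, weak_variance


def hypotheses_per_symbol(pulses_per_symbol, num_strong):
    """Upper bound on enumerated hypotheses per symbol per user."""
    return pulses_per_symbol * 2 ** num_strong
```

For example, with five pulses per symbol and three strong pulses kept, at most `hypotheses_per_symbol(5, 3) = 40` hypotheses are enumerated per symbol, regardless of how many pulses actually collide.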

B. Low-Complexity Implementation: The Soft Interference Cancellation Approach 

The complexity of the low-complexity implementation presented in the previous subsection might still 
be high for large numbers of users or pulse rates. As such, an even simpler implementation method is 
required. In what follows a very low complexity implementation based on soft interference cancellation 
is presented. 

Recall that the most complex task in the pulse-symbol detector is the computation of the a priori
log-likelihood ratio of the received sample given the transmitted pulse, $\Lambda_1(b^k_j) = \log \frac{f(\tilde{r}^k_j \mid b^k_j = 1)}{f(\tilde{r}^k_j \mid b^k_j = -1)}$. Our
aim is to find a simple way to approximate $\Lambda_1(b^k_j)$, and soft interference cancellation provides us
with such a method [16], [19]. Recall that the model for $\tilde{r}^k_j$ is given by $\tilde{r}^k_j = \sum_{m=1}^{M} h^k_m r^k_{j,m}$, where
$r^k_{j,m} = h^k_m b^k_j [\mathbf{S}_k]_{jN_c+c^k_j,\lfloor j/N_f \rfloor} + (\mathbf{h}^k_{j,m})^T \mathbf{b}^k_{j,m} + n_{l(j,k,m)}$. In soft interference cancellation methods, the first
step is to form a soft estimate of $\mathbf{b}^k_{j,m}$. This soft estimate is the conditional mean of $\mathbf{b}^k_{j,m}$ based on our
current knowledge. We denote this soft estimate by $\hat{\mathbf{b}}^k_{j,m} = E\{\mathbf{b}^k_{j,m} \mid \{\Lambda_2(b^k_j)\}\}$, which is given by
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(18) 
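
For a binary pulse taking values in {+1, -1}, the conditional mean given a log-likelihood ratio has a simple closed form, which the short Python sketch below illustrates. The function names are hypothetical, and the LLR is assumed to be defined as log(P(+1)/P(-1)); the sketch is meant only to make the soft-estimate step concrete.

```python
import math


def soft_pulse_estimate(extrinsic_llr):
    """Conditional mean of a +/-1 pulse with LLR log(P(+1)/P(-1)).

    E[b] = P(+1) - P(-1) = tanh(llr / 2).
    """
    return math.tanh(0.5 * extrinsic_llr)


def residual_variance(soft_value):
    """Variance of a +/-1 pulse around its soft estimate: 1 - E[b]^2."""
    return 1.0 - soft_value ** 2
```

An LLR of zero gives a soft estimate of zero (no cancellation), while a large positive or negative LLR drives the estimate toward a hard decision and the residual variance toward zero.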



Assuming that this soft estimate is reliable, the remodulated signal $(\mathbf{h}^k_{j,m})^T \hat{\mathbf{b}}^k_{j,m}$ is subtracted from $r^k_{j,m}$,
resulting in
Subtracting the remodulated signal from $r^k_{j,m}$ results in the reduction of the MAI. Since the number
of collisions is large, the remaining MAI, $(\mathbf{h}^k_{j,m})^T (\mathbf{b}^k_{j,m} - \hat{\mathbf{b}}^k_{j,m})$, can be
approximated by a Gaussian random variable, as follows:
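Then, the soft estimate for $\tilde{r}^k_j$ can be obtained as

$$\hat{r}^k_j = \sum_{m=1}^{M} h^k_m \hat{r}^k_{j,m} = \xi b^k_j + \tilde{n}^k_j, \qquad (23)$$

where $\xi = [\mathbf{S}_k]_{jN_c+c^k_j,\lfloor j/N_f \rfloor} \sum_{m=1}^{M} (h^k_m)^2$, and $\tilde{n}^k_j = \sum_{m=1}^{M} h^k_m n^k_{j,m}$, with $n^k_{j,m} = (\mathbf{h}^k_{j,m})^T (\mathbf{b}^k_{j,m} - \hat{\mathbf{b}}^k_{j,m}) + n_{l(j,k,m)}$.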

In the proposed very low-complexity implementation of the pulse-symbol algorithm, the pulse detector
computes the a priori log-likelihood ratio of $\hat{r}^k_j$ given the transmitted symbol, instead of the a priori
log-likelihood ratio of $\tilde{r}^k_j$ given the transmitted symbol. Denote by $\tilde{\Lambda}_1(b^k_j)$ this log-likelihood ratio; that
is, $\tilde{\Lambda}_1(b^k_j) \triangleq \log \frac{f(\hat{r}^k_j \mid b^k_j = 1)}{f(\hat{r}^k_j \mid b^k_j = -1)}$. By using the Gaussian approximation for the residual MAI as shown in
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As in the previously proposed low complexity implementation, the pulse detector computes the a 
priori log-likelihood ratios, $\{\tilde{\Lambda}_1(b^k_j)\}$, instead of the exact a priori log-likelihood ratios. The symbol
detector uses these approximated LLRs as its extrinsic information, and it computes a new set of extrinsic
information, $\{\Lambda_2(b^k_j)\}$, based on the approximated LLRs provided by the pulse detector. The algorithm
then continues to iterate between the two stages until convergence is reached.
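
The soft interference cancellation step for one pulse can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: all names are hypothetical, the soft estimates are taken as inputs, the residual of each cancelled pulse is assumed to have variance g^2 (1 - b_hat^2), the residuals on different Rake fingers are assumed independent, and the noise variance is assumed to be strictly positive.

```python
def sic_pulse_llr(samples, finger_gains, polarity,
                  interferer_gains, soft_bits, noise_var):
    """Approximate LLR of one pulse after soft interference cancellation.

    samples[m]          : received sample on Rake finger m
    finger_gains[m]     : channel gain of the user of interest on finger m
    polarity            : +/-1 signature entry of the pulse of interest
    interferer_gains[m] : gains of the pulses colliding on finger m
    soft_bits[m]        : soft estimates (in [-1, 1]) of those pulses
    noise_var           : variance of the additive Gaussian noise (> 0)
    """
    r_hat = 0.0
    variance = 0.0
    for r, h, gains, bits in zip(samples, finger_gains,
                                 interferer_gains, soft_bits):
        # Remodulate the soft estimates and subtract them from the sample.
        remodulated = sum(g * b for g, b in zip(gains, bits))
        # Residual MAI modeled as Gaussian with this variance (assumption).
        residual_var = sum(g * g * (1.0 - b * b) for g, b in zip(gains, bits))
        r_hat += h * (r - remodulated)
        variance += h * h * (residual_var + noise_var)
    xi = polarity * sum(h * h for h in finger_gains)
    # Gaussian model r_hat = xi * b + noise gives LLR = 2 * xi * r_hat / var.
    return 2.0 * xi * r_hat / variance
```

Unlike the enumeration-based detector, the cost here grows only linearly with the number of colliding pulses, which reflects the complexity advantage claimed for this approach.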



V. Simulations 

In this section, simulation results are presented in order to investigate the performance of various
receiver structures as a function of the signal-to-noise ratio (SNR). The UWB indoor channel model
reported by the IEEE 802.15.3a task group is used for generating UWB multipath channels [7], and
the uplink of a synchronous TH-IR system with $N_f = 5$, $N_c = 250$, and a bandwidth of 0.5 GHz is
considered. It is assumed that there is no inter-frame interference (IFI) in the system.¹ Note, however,
that the analysis in Sections III and IV covers scenarios with IFI as well.
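
One way to enforce the no-IFI assumption in a simulation is to restrict the time-hopping (TH) codes so that the multipath spread of every pulse stays inside its own frame, as the footnote on TH code generation indicates. The Python sketch below illustrates this; the function name and arguments are hypothetical, and `num_paths` is assumed to play the role of L, the channel length in chips.

```python
import random


def generate_th_codes(num_users, num_frames, chips_per_frame, num_paths,
                      seed=None):
    """Draw TH codes uniformly from {0, ..., chips_per_frame - num_paths - 1}.

    With codes in this range, a pulse and its num_paths multipath
    components end before the next frame starts, so no IFI occurs.
    """
    upper = chips_per_frame - num_paths
    if upper <= 0:
        raise ValueError("frame too short for the channel length")
    rng = random.Random(seed)
    return [[rng.randrange(upper) for _ in range(num_frames)]
            for _ in range(num_users)]
```

For instance, `generate_th_codes(5, 5, 250, 25, seed=0)` returns five code sequences of length five with every entry between 0 and 224.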

In Figure 5, bit error rates (BERs) of various receivers are plotted as functions of the SNR using
100 realizations of CM-1 [7]. There are 5 users in the environment (K = 5), where the first user is 
assumed to be the user of interest. Each interfering user is modeled to have 10 dB more power than the 
user of interest so that an MAI-limited scenario can be investigated. Note that the benefits of iterative 
multiuser detectors become more obvious in the MAI-limited regime. At all the receivers, the first 25 
multipath components are employed; i.e., $\mathcal{L} = \{1, \ldots, 25\}$. In the figure, the curve labeled "MRC-Rake"
corresponds to the performance of a conventional MRC-Rake receiver [4]; the curves labeled "LC" 
correspond to the performance of the low complexity implementation method based on the Gaussian 
approximation (T = 10 dB is used); and the curves labeled "SIC" correspond to the performance of 
the low complexity implementation method based on soft interference cancellation. Also, the single user 
bound is plotted for an MRC-Rake receiver in the absence of interfering users. From the figure, it is 
observed that the BERs of the proposed detectors are considerably lower than those of the MRC-Rake. 
In addition, after two iterations, the performance of the proposed receivers gets very close to that of a 
single user system. Finally, the low complexity implementation based on the Gaussian approximation
outperforms the low complexity implementation based on soft interference cancellation on the first iteration,
which is the price paid for the lower complexity of the latter algorithm. In other words, the soft interference
cancellation approach estimates the overall MAI by first order moments, and approximates the difference between the
MAI and the MAI estimate by Gaussian random variables. This reduces the complexity significantly,
but it also causes a performance loss, since the Gaussian approximation is applied more extensively than in the
low complexity implementation that uses Gaussian approximations only for weak MAI terms. However,
after two iterations, both receivers get very close to the single-user bound, and the low complexity
implementation based on soft interference cancellation becomes more advantageous due to its lower
computational complexity (cf. Figure 7).

In Figure 6, the same parameters as in the previous case are used, and the performance of the low
complexity implementation based on the Gaussian approximation is investigated for various threshold
values. As can be observed from the plot, as the threshold is decreased, i.e., as more MAI terms are
approximated by Gaussian random variables, the performance of the algorithm degrades. In other words,
there is a tradeoff between performance and complexity, as expected from the study in Section IV-A.
Also note that since each interfering user is 10 dB stronger than the user of interest, there is not much
difference between the T = 10 dB and T = 0 dB cases (as most of the significant MAI terms are usually
above the threshold in both cases), whereas the performance degrades significantly for the T = -10 dB
case.

¹ TH codes are generated randomly from $\{0, 1, \ldots, N_c - L - 1\}$ in order not to cause any IFI.

Next, the performance of the receivers is investigated for CM-3 of the IEEE 802.15.3a channel model,
where T = 0 dB is used for the low complexity implementation based on the Gaussian approximation.²
The same observations as in Figure 5 are made. The main difference in this case is the increase in the
BERs, which is a result of the larger channel delay spread of the channel model used in the simulations.
In other words, less energy is collected on average, which results in an increase in average BERs.

In order to compare the performance of the proposed receivers under computational constraints, the 
performance loss (in dB) of each receiver compared to a single user receiver is plotted versus the 
average number of multiplication operations per user in Figure 7. The performance loss is calculated
as the difference between the SNR needed for the receiver to achieve a BER of 10^'^ and the SNR 
of the single user receiver at BER=10~'^. For each receiver, the points on the curve are obtained for 
1, 2 and 3 iterations. From Figure 7, it is concluded that the low complexity implementation based on
soft interference cancellation provides a better performance-complexity tradeoff than the low complexity 
implementation based on the Gaussian approximation. 
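
The performance-loss metric described above (the SNR gap to the single-user receiver at a fixed target BER) can be computed from simulated BER curves as in the following Python sketch. The function names, the interpolation of log10(BER) between simulated SNR points, and the assumption that each BER curve is non-increasing in SNR are choices made for this illustration; the text does not specify how the SNR at the target BER was obtained.

```python
import math


def snr_at_ber(snr_db, ber, target_ber):
    """SNR (dB) at which a non-increasing BER curve reaches target_ber.

    Interpolates log10(BER) linearly between adjacent simulated points.
    """
    points = list(zip(snr_db, ber))
    for (s0, b0), (s1, b1) in zip(points, points[1:]):
        if b0 >= target_ber >= b1:
            if b0 == b1:
                return s0
            frac = ((math.log10(b0) - math.log10(target_ber))
                    / (math.log10(b0) - math.log10(b1)))
            return s0 + frac * (s1 - s0)
    raise ValueError("target BER is outside the simulated range")


def performance_loss_db(snr_db, ber_receiver, ber_single_user, target_ber):
    """SNR gap (dB) between a receiver and the single-user receiver."""
    return (snr_at_ber(snr_db, ber_receiver, target_ber)
            - snr_at_ber(snr_db, ber_single_user, target_ber))
```

Evaluating this gap after each iteration, together with the corresponding operation count, yields one point per iteration on a performance-complexity curve of the kind shown in Figure 7.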

Finally, the performance of the receivers that sample only the first 5 multipath components (i.e.,
$\mathcal{L} = \{1, 2, 3, 4, 5\}$) is investigated. In this case, it is observed from Figure 8 that the proposed receivers
can still perform very close to the single-user bound, whereas the MRC-Rake receiver experiences a
serious error floor.

² The curves are very similar to the ones in Figure 5; hence, they are not shown separately.

VI. Summary and Concluding Remarks

In this paper an iterative approach, the pulse-symbol detector, for multiuser detection in TH-IR systems
has been presented for frequency-selective environments. In this approach, the detection problem is
divided, artificially, into two parts, and the proposed algorithm iterates between these two parts. In each
iteration, the algorithm passes extrinsic information between the two parts, resulting in an increase in
the accuracy of the decisions made by the detector. The complexity of the proposed detector is random;
hence, comparing the complexity of this detector with other fixed complexity algorithms is complicated.
Nevertheless, we have demonstrated, via simulations, that there are scenarios where the complexity of the
proposed detector is lower than the complexity of the optimal detector, while in others it is higher.

In addition, two low-complexity implementations have been presented. The first implementation is 
based on approximating parts of the MAI by a Gaussian random variable and the second is based on 
soft interference cancellation. The complexity of both implementations is quite low, and we believe 
that these algorithms could be used in practical systems. The performance characteristics of these low- 
complexity implementations have been examined using simulations. We have shown that these algorithms 
typically get very close to the single-user bound after only a few iterations, and outperform the MRC-Rake 
substantially. 

The proposed multiuser detection algorithms were described under the assumption of synchronous 
users. However, it is easily seen that this assumption was made only for notational simplicity. The pulse 
detector inherently ignores any information about the symbols and their structure, and in particular their 
timing. It uses only the information about the individual pulses that collide with the pulse of interest. The 
symbol detector uses the results of the pulse detector for pulses that correspond to the symbol of interest. 
As such, the symbol detector is independent of the other symbols from the same user and of the symbols
from other users. It is therefore evident that synchronization among users is not required. Moreover,
it is easy to design a serialized version of the proposed algorithm, in which the receiver processes
new samples on the fly, at the expense of some performance degradation. In summary, the only requirement
on the receiver is the knowledge of each user's symbol timing, which is commonly obtained during
synchronization phases in practical systems.
Fig. 1. The general structure of the receiver, where $p_{rx}(t)$ denotes the received UWB pulse.

Fig. 2. CDF of $\max_{j=1,\ldots,N_f} \tilde{K}_j$ for various pulse rates. [Plot: CDF on the vertical axis, $Y(K)$ on the horizontal axis.]

Fig. 3. CDF of $\max_{j=1,\ldots,N_f} \tilde{K}_j$ for various pulse rates and T = 3 dB.

Fig. 5. BER as a function of the SNR for various receivers.

Fig. 6. BER as a function of the SNR for various receivers, where the Gaussian approximation technique is plotted for various